Can I force a certain page - After setURL?

Help
2012-05-24
2014-02-02
  • brundleseth

    brundleseth - 2012-05-24

    Excellent piece of framework, this. Intuitive enough for even me to understand ;)

    I have a query, though:

    Is it possible to somehow force phpCrawl to visit certain page(s)? I.e., can I somehow add certain pages to the crawl list?

    For example, if I start with $crawler->setURL("http://www.php.net"); but in the same session also want to make sure I visit mysql.com, or perhaps a certain sub-section of php.net that would not otherwise get crawled? (Yes, the examples are very imaginary, obviously ;-)

    Thanks in advance :)

     
  • Nobody/Anonymous

    Hey,

    my examples are WHAT? ;)

    And I'm sorry, there's no setting that lets you add one or more URLs directly to the queue besides the one in setUrl().

    But feel free to add this request to the list of feature-requests: http://sourceforge.net/tracker/?group_id=89439&atid=590149

    This little workaround works as well:

    class MyCrawler extends PHPCrawler
    {
      function initChildProcess()
      {
        $UrlDescriptor = new PHPCrawlerURLDescriptor("http://my-url-to-add.de");
        $this->LinkCache->addURL($UrlDescriptor);
      }

      function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
      {
        // ...
      }
    }

    $crawler = new MyCrawler();
    $crawler->setUrl("php.net");
    // ...
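The same workaround can be generalised to several extra URLs. The following is only a sketch, not part of PHPCrawl: the class name MultiSeedCrawler, the property $extraUrls and the method addExtraUrl() are hypothetical helpers made up for illustration. The only library pieces it relies on are the ones used in the workaround above (initChildProcess(), PHPCrawlerURLDescriptor and $this->LinkCache->addURL()). The stub classes at the top are stand-ins so the snippet can be run without the library; they do not reflect the library's real internals and are skipped when PHPCrawl has been included first.

```php
<?php
// Stand-in stubs so the sketch can be run without PHPCrawl present.
// They are NOT the library's real classes; include PHPCrawl before this
// code and the whole block is skipped.
if (!class_exists('PHPCrawler')) {
    class PHPCrawlerURLDescriptor
    {
        public $url_rebuild;
        public function __construct($url) { $this->url_rebuild = $url; }
    }

    class StubLinkCache
    {
        public $urls = array();
        public function addURL(PHPCrawlerURLDescriptor $d) { $this->urls[] = $d->url_rebuild; }
    }

    class PHPCrawler
    {
        protected $LinkCache;
        public function __construct() { $this->LinkCache = new StubLinkCache(); }
        public function setURL($url) { }
        public function initChildProcess() { }
    }
}

class MultiSeedCrawler extends PHPCrawler
{
    // Hypothetical helper state: extra URLs to queue besides the setURL() one.
    protected $extraUrls = array();

    // Hypothetical helper, not a PHPCrawl method: remember a URL for later.
    public function addExtraUrl($url)
    {
        $this->extraUrls[] = $url;
    }

    // Same hook as in the workaround above: push every remembered URL
    // into the crawler's link cache.
    public function initChildProcess()
    {
        foreach ($this->extraUrls as $url) {
            $this->LinkCache->addURL(new PHPCrawlerURLDescriptor($url));
        }
    }
}

$crawler = new MultiSeedCrawler();
$crawler->setURL("http://www.php.net");
$crawler->addExtraUrl("http://www.mysql.com");
$crawler->addExtraUrl("http://www.php.net/manual/en/");
// ... further setup and $crawler->go() as usual
```

The extra URLs are only remembered by addExtraUrl() and queued later inside the hook, mirroring the workaround, which also touches the link cache only from initChildProcess(). How the hook behaves with multiple child processes has not been checked here.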
    

    Hope I could help.

     
  • brundleseth

    brundleseth - 2012-05-25

    No no - MY examples are very imaginary ;-) I was not talking down about those included :-D

    Thanks for your suggestion. Is initChildProcess only part of the multiprocessing approach? I'm not sure I fully understand if/how your example works.

    Is there actually an AddUrl function as in your example?

    Feature request is added :-)

    https://sourceforge.net/tracker/?func=detail&aid=3529802&group_id=89439&atid=590149

    And: Thanks for providing such excellent support!

     
  • Anonymous

    Anonymous - 2013-09-02

    Any news on this "add url" feature? It would be quite useful!

     

    Last edit: Anonymous 2014-11-19
  • Anonymous

    Anonymous - 2014-02-02

    The code above works well. Thanks.

     

    Last edit: Anonymous 2014-11-22
